Using Syntactic Information To Identify Plagiarism

Authors

  • Özlem Uzuner
  • Boris Katz
  • Thade Nahnsen
Abstract

Using keyword overlaps to identify plagiarism can result in many false negatives and positives: substitution of synonyms for each other reduces the similarity between works, making it difficult to recognize plagiarism; overlap in ambiguous keywords can falsely inflate the similarity of works that are in fact different in content. Plagiarism detection based on verbatim similarity of works can be rendered ineffective when works are paraphrased even in superficial and immaterial ways. Considering linguistic information related to creative aspects of writing can improve identification of plagiarism by adding a crucial dimension to evaluation of similarity: documents that share linguistic elements in addition to content are more likely to be copied from each other. In this paper, we present a set of low-level syntactic structures that capture creative aspects of writing and show that information about linguistic similarities of works improves recognition of plagiarism (over tfidf-weighted keywords alone) when combined with similarity measurements based on tfidf-weighted keywords.
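The false-negative problem described in the abstract can be illustrated with a minimal sketch of tf-idf-weighted keyword similarity. This is not the authors' implementation: the whitespace tokenizer, the `tf * log(N / df)` weighting, the toy sentences, and the function names are all assumptions chosen for illustration. The `combined_similarity` helper is likewise hypothetical, since the abstract does not say how keyword and syntactic evidence are combined; a weighted average is used here only as a placeholder.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(keyword_sim, syntactic_sim, alpha=0.5):
    """Hypothetical combination: weighted average of the two scores."""
    return alpha * keyword_sim + (1.0 - alpha) * syntactic_sim


docs = [
    "the boy bought a large house",    # original
    "the lad purchased a big home",    # paraphrase using synonyms
    "the bank raised interest rates",  # unrelated content
]
vecs = tfidf_vectors(docs)
paraphrase_sim = cosine(vecs[0], vecs[1])
unrelated_sim = cosine(vecs[0], vecs[2])
print(f"original vs paraphrase: {paraphrase_sim:.3f}")
print(f"original vs unrelated:  {unrelated_sim:.3f}")
```

In this toy example the paraphrase scores almost as low as the unrelated sentence, because synonym substitution removes nearly all shared keywords. A syntactic similarity score, computed separately, would be the second input to `combined_similarity`.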

Similar resources

PDLK: Plagiarism detection using linguistic knowledge

Plagiarism is described as the reuse of someone else’s previous ideas, work or even words without sufficient attribution to the source. This paper presents a method to detect external plagiarism using the integration of semantic relations between words and their syntactic composition. The problem with the available methods is that they fail to capture the meaning in comparison between a source ...

Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach

Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...

Identification of Plagiarism Using Syntactic and Semantic Filters

We present a work on detection of manual paraphrasing in documents in comparison with a set of source documents. Manual paraphrasing is a realistic type of plagiarism, where the obfuscation is introduced manually in documents. We have used PAN-PC-10 data set to develop and evaluate our algorithm. The proposed approach consists of two steps, namely, identification of probable plagiarized passage...

An introduction to the examples of scientific plagiarism and its identification soft-wares

Background: Increasing immorality and plagiarism in the country's higher education system have become a serious crisis. Hence, the purpose of this study was to analyze examples of plagiarism and to introduce plagiarism detection software. Method: The present study is a narrative review. Articles in Persian and Latin related to the use of scientific theft key words in databases w...

Automatic plagiarism detection using similarity analysis

Plagiarism involves reproducing existing information in modified form, or sometimes reproducing the original document as it is. This is quite common among students, researchers and academicians. It has strongly influenced the research community and raised awareness among academics of the need to prevent this kind of malpractice. Though there exist some commercial tools to detect plagiarism, still plagia...

Publication date: 2005